British Columbia
How Neural Reward Models Learn Features for Policy Optimization: A Single-Index Analysis
Higuchi, Rei, Kawata, Ryotaro, Wachi, Akifumi, Takakura, Shokichi, Miyaguchi, Kohei, Suzuki, Taiji
Reward modeling is not only a prediction problem: in KL-regularized policy optimization, the learned reward is exponentiated to define the deployed policy, so downstream value depends on errors in reward-tilted regions. We study this feedback in a Gaussian single-index model with $r^*(x) = σ^*(\langle θ^*, x\rangle)$ and $x \sim N(0, I_d)$. We analyze a two-stage neural reward model that first learns the hidden direction $θ^*$ from reward-weighted samples and then fits the readout layer by weighted ridge regression. Exponential reward weighting changes the Hermite signal available to the first layer; for any feature-learning temperature $β_1$ above a dimension-free $O(1)$ threshold, a constant fraction of neurons recover the hidden direction, with weak-recovery complexity governed by the generative exponent. After feature recovery, we derive tilted-policy value-gap bounds for an idealized label-weighted fit with weights $e^{y/β_2}$ and a more practical surrogate-weighted fit with weights $e^{r_{a_0}(x)/β_2}$. Keeping the $β_2$-dependence explicit yields an admissible set of deployment temperatures, balancing the gain from lowering $β_2$ against the learning cost amplified by exponential weighting; in the surrogate-weighted case, proxy-dependent factors shrink this admissible set.
I Went to See What's Happened to the Home of the TED Talk. It Was a Little Terrifying.
Meanwhile its Audacious Project --a funding initiative that gives mature nonprofits the opportunity to pitch "moonshot" plans to a coalition of philanthropists--has raised over $1 billion in each of the last two years, in an epic Robin Hood operation for a handful of large-scale projects on climate, health, education, and criminal justice: The Audacious recipients here this year are taking this brief break from their work preventing 16 million unsafe abortions, helping governments in 20 countries prevent lead poisoning, or intercepting 5 percent of the world's river-borne plastic before it reaches the ocean.
Canadian officials claim OpenAI violated federal and provincial privacy laws
Philippe Dufresne, the Privacy Commissioner of Canada, has found OpenAI was not compliant with Canadian federal and provincial privacy laws in the training of its AI models. Following an investigation, Dufresne and his counterparts in Alberta, Quebec and British Columbia say OpenAI's approach to things like data collection and consent stepped on multiple laws, including Canada's Personal Information Protection and Electronic Documents Act (PIPEDA), which governs how companies collect and use personal information during the normal course of business. The commissioners participating in the investigation identified multiple privacy issues with OpenAI's approach, including that the company gathered vast amounts of personal information without adequate safeguards to prevent use of that information to train its models, and that it failed to acquire consent to collect and use that personal information in the first place. Warnings in ChatGPT note that interactions with the AI could be used in training, but third-party data OpenAI has purchased or scraped also includes personal details people likely aren't even aware of. The fact that ChatGPT users have no way to access, correct or delete that data was another issue that the commissioners identified, according to a summary of the investigation's findings, along with OpenAI's lackluster attempts to acknowledge the inaccuracy of some of ChatGPT's responses.
Recovery Analysis for Plug-and-Play Priors using the Restricted Eigenvalue Condition
The plug-and-play priors (PnP) and regularization by denoising (RED) methods have become widely used for solving inverse problems by leveraging pre-trained deep denoisers as image priors. While the empirical imaging performance and the theoretical convergence properties of these algorithms have been widely investigated, their recovery properties have not previously been theoretically analyzed. We address this gap by showing how to establish theoretical recovery guarantees for PnP/RED by assuming that the solution of these methods lies near the fixedpoints of a deep neural network. We also present numerical results comparing the recovery performance of PnP/RED in compressive sensing against that of recent compressive sensing algorithms based on generative models. Our numerical results suggest that PnP with a pre-trained artifact removal network provides significantly better results compared to the existing state-of-the-art methods.
Families sue OpenAI, alleging chatbot aided in Canadian school shooting
The families of victims of a school shooting in a remote Canadian Rockies town are suing artificial intelligence company OpenAI in a United States federal court, alleging that the ChatGPT maker failed to alert police to the shooter's alarming interactions with the chatbot. A lawsuit filed on Wednesday on behalf of 12-year-old Maya Gebala, who was critically injured in the February shooting, is among the first of more than two dozen cases from families in Tumbler Ridge, British Columbia, in what their lawyers say represents "an entire community stepping forward to hold OpenAI accountable". The cases represent the families of the five slain children targeted in the school shooting. Those include Zoey Benoit, Abel Mwansa Jr, Ticaria "Tiki" Lampert, Kylie Smith, all 12, and Ezekiel Schofield, 13, as well as education assistant Shannda Aviugana-Durand. Jesse Van Rootselaar, whose interactions with ChatGPT are at the centre of the lawsuits, shot her mother and stepbrother at home before killing an educational assistant and five students aged 12 to 13 at her former school on February 10, according to police.
Victims Allege OpenAI Is Responsible for Mass Shooting
A new lawsuit underscores key questions about the Tumbler Ridge killer's use of ChatGPT. A community vigil in Tumbler Ridge two days after the rural community experienced one of Canada's deadliest shootings Paige Taylor White/AFP/Getty Get your news from a source that's not owned and controlled by oligarchs. Victims of the Tumbler Ridge mass shooting and their families sued OpenAI and its CEO, Sam Altman, in US district court in San Francisco on Wednesday, claiming various negligence, product liability, and other violations. The civil complaints are the latest in a wave of litigation against OpenAI alleging that its globally popular chatbot, ChatGPT, helped people commit lethal violence. The complaints were filed by families of multiple victims wounded and killed at Tumbler Ridge Secondary School in British Columbia, Canada, where a suicidal 18-year-old opened fire on February 10.
Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)
Generating multivariate time series is a promising approach for sharing sensitive data in many medical, financial, and IoT applications. A common type of multivariate time series originates from a single source such as the biometric measurements from a medical patient. This leads to complex dynamical patterns between individual time series that are hard to learn by typical generation models such as GANs. There is valuable information in those patterns that machine learning models can use to better classify, predict or perform other downstream tasks. We propose a novel framework that takes time series' common origin into account and favors channel/feature relationships preservation. The two key points of our method are: 1) the individual time series are generated from a common point in latent space and 2) a central discriminator favors the preservation of inter-channel/feature dynamics. We demonstrate empirically that our method helps preserve channel/feature correlations and that our synthetic data performs very well in downstream tasks with medical and financial data.
Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets
Arthur da Cunha, Université Côte d'Azur, Inria, CNRS, I3S, Aarhus University, Aarhus, Denmark, dac@cs.au.dk, "3026 Francesco d'Amore, Aalto University, Bocconi University, Espoo, Finland, francesco.damore@aalto.fi "3026 Emanuele Natale, Université Côte d'Azur, Inria, CNRS, I3S, Sophia Antipolis, France, emanuele.natale@inria.fr
Learning State Representations from Random Deep Action-conditional Predictions
Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions--random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon--form good auxiliary tasks for reinforcement learning (RL) problems. In particular, we show that random deep action-conditional predictions when used as auxiliary tasks yield state representations that produce control performance competitive with state-of-the-art hand-crafted auxiliary tasks like value prediction, pixel control, and CURL in both Atari and DeepMind Lab tasks. In another set of experiments we stop the gradients from the RL part of the network to the state representation learning part of the network and show, perhaps surprisingly, that the auxiliary tasks alone are sufficient to learn state representations good enough to outperform an end-to-end trained actor-critic baseline.